ABSTRACT

This paper is concerned with the determination of regression functions
from only a partial characterization of the joint distribution. It is
shown that statistical information consisting of various moments and joint
moments is sufficient to characterize a regression function. An
application to regression functionals is also considered.

I. INTRODUCTION

Let X and Y be random variables with Y integrable, i.e. $E\{|Y|\} < \infty$,
and consider the regression function of Y on X,

$$m(x) = E\{Y \mid X = x\}.$$

As is well known, $m(\cdot)$ is a Borel measurable function, and it
frequently arises in engineering applications. For example, if Y is a
second order random variable, then the minimum mean squared error estimate
of Y in terms of X is given by m(X) [1, pp. 77-78].

In some cases $m(\cdot)$ has a particularly simple form. For example, if
X and Y are jointly Gaussian with respective means $m_X$ and $m_Y$,
respective variances $\sigma_X^2 > 0$ and $\sigma_Y^2$, and correlation
coefficient $\rho$, then

$$m(x) = ax + b, \tag{1}$$

where $a = (\sigma_Y/\sigma_X)\rho$ and $b = m_Y - a\,m_X$. However, in the
case of jointly Gaussian random variables, $m_X$, $m_Y$, $\sigma_X$,
$\sigma_Y$, and $\rho$ completely determine the bivariate distribution of
the two random variables.
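As a small illustration of Eq. (1), the slope and intercept can be computed
directly from the five Gaussian parameters. The function name and the
numerical values below are hypothetical, chosen only for this sketch.

```python
def gaussian_regression_line(m_x, m_y, sigma_x, sigma_y, rho):
    """Return (a, b) of Eq. (1) for a jointly Gaussian pair (X, Y)."""
    if sigma_x <= 0:
        raise ValueError("sigma_x must be strictly positive")
    a = (sigma_y / sigma_x) * rho   # slope
    b = m_y - a * m_x               # intercept
    return a, b


# Hypothetical parameter values, for illustration only.
a, b = gaussian_regression_line(m_x=1.0, m_y=2.0,
                                sigma_x=2.0, sigma_y=1.0, rho=0.5)
print(a, b)
```

With these values the slope is (1/2)(1/2) = 1/4 and the intercept is
2 - 1/4 = 7/4.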

In more general cases, the question arises as to how much information
about the bivariate distribution is required to determine the regression
function. If X and Y are two second order random variables that are
separable in the sense of Nuttall [2], then the regression function
$m(\cdot)$ has the form given by (1). However, knowing that two second
order random variables are separable in the sense of Nuttall, and knowing
the means, variances, and the correlation coefficient, is not sufficient to
determine the bivariate distribution of the two random variables. Notice
that any two random variables whose bivariate characteristic function is
elliptically symmetric are separable in the sense of Nuttall [3].

As we have seen, there exists a class of joint distributions such that
the regression function can be determined knowing that the two random
variables belong to that class and also knowing means, variances, and the
correlation coefficient. However, it might seem reasonable to conjecture
that in more general cases, the regular conditional distribution [4] of Y
given X=x is required. Although the regular conditional distribution of
Y given X=x is sufficient to determine m(x), in the next section we will
see that it is never necessary.

In  this  paper  we  will  be  concerned  with  statistical  information  such 
that  there  can  be  only  one  regression  function  consistent  with  the  given 
statistical  information.  In  the  next  section  we  consider  the  regression 
of  Y  on  a  random  variable  and  then  on  a  random  vector.  Then  in  the 
following  section  we  consider  the  regression  functional,  that  is,  the 
regression  of  Y  on  a  random  process. 

II. DEVELOPMENT


Let Y be a second order random variable, let X be a random variable
with compact support, and let $m(\cdot)$ be the regression function of Y on
X defined in the Introduction, $m(x) = E\{Y \mid X = x\}$. Define the
measure $\mu$ on the Borel sets of $\mathbb{R}$ by

$$\mu(A) = P(X \in A),$$

and let $\|\cdot\|$ denote the $L_2(\mu)$ norm. We will say that a
polynomial has max degree N if the degree of the polynomial is no greater
than N. We note that for any $\varepsilon > 0$, if N is sufficiently large,
there exists a polynomial $P_N(x)$ of max degree N such that

$$\|m - P_N\| < \varepsilon. \tag{2}$$

That is, there exists a continuous function $h(\cdot)$ such that [5]

$$\|m - h\| < \varepsilon/2,$$

and by the Weierstrass Theorem there exists a polynomial $P_N$ of max degree
N, with N sufficiently large, such that

$$\|h - P_N\| < \varepsilon/2.$$

Thus Eq. (2) follows by the triangle inequality. Hence there exists a
sequence of polynomials $P_N(x)$ such that

$$P_N(x) \to m(x) \text{ in } L_2(\mu).$$

Let $Q_N(x)$ be the polynomial of max degree N that is closer to m(x) (in
$L_2(\mu)$) than any other polynomial of max degree N. We note in passing
that $Q_N(x)$ is uniquely defined a.e.$[\mu]$ by the Projection Theorem.
That is, there may exist more than one representation of $Q_N(x)$ (i.e.
with different coefficients) but they are all equal a.e.$[\mu]$. From the
preceding observations, we have that

$$Q_N(x) \to m(x) \text{ in } L_2(\mu).$$
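The step from Eq. (2) to the convergence of the best approximations can be
written out explicitly. The following is a short sketch using only the
definitions above; the symbol $\mathcal{P}_N$ for the set of polynomials of
max degree N is introduced here for convenience and does not appear in the
original development.

```latex
\|m - Q_N\| \;=\; \min_{P \in \mathcal{P}_N} \|m - P\|
\;\le\; \|m - P_N\| \;<\; \varepsilon .
```

Since $\mathcal{P}_N \subset \mathcal{P}_{N+1}$, the error $\|m - Q_N\|$ is
nonincreasing in N, so once it falls below $\varepsilon$ it remains below
$\varepsilon$ for all larger N.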

Express the polynomial $Q_N(x)$ as

$$Q_N(x) = \sum_{j=0}^{N} a_j(N)\, x^j.$$

It follows from the Projection Theorem that the $a_j(N)$ can be determined
from the relation

$$E\Big\{\Big[m(X) - \sum_{j=0}^{N} a_j(N)\, X^j\Big] X^k\Big\} = 0,
\quad k = 0, 1, 2, \ldots, N.$$

This is equivalent to

$$E\{X^k Y\} = \sum_{j=0}^{N} a_j(N)\, E\{X^{j+k}\},
\quad k = 0, 1, 2, \ldots, N. \tag{3}$$

Thus we have seen that from a knowledge of

$$E\{X^k\}, \quad k = 1, 2, \ldots$$

and

$$E\{Y X^k\}, \quad k = 0, 1, 2, \ldots,$$

we can construct a sequence of polynomials $Q_N(x)$ that converge in
$L_2(\mu)$ to m(x).
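The system in Eq. (3) is a set of N+1 linear equations whose coefficient
matrix is the Hankel matrix of the moments of X. A minimal numerical sketch
follows, assuming NumPy is available. The function name
`projection_coefficients` and the test distribution (X uniform on [0, 1]
with regression function x squared) are assumptions made for illustration.

```python
import numpy as np


def projection_coefficients(x_moments, yx_moments, N):
    """Solve Eq. (3) for a_0(N), ..., a_N(N).

    x_moments[k]  = E{X^k}   for k = 0, ..., 2N (with x_moments[0] = 1)
    yx_moments[k] = E{Y X^k} for k = 0, ..., N
    """
    # Hankel matrix with entries H[k, j] = E{X^(j+k)}.
    H = np.array([[x_moments[j + k] for j in range(N + 1)]
                  for k in range(N + 1)], dtype=float)
    rhs = np.asarray(yx_moments[:N + 1], dtype=float)
    # Least squares also returns a solution when H is singular, in which
    # case the coefficients are not unique.
    coeffs, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return coeffs


# Illustrative assumption: X uniform on [0, 1] and E{Y | X = x} = x**2,
# so that E{X^k} = 1/(k+1) and E{Y X^k} = E{X^(k+2)} = 1/(k+3).
N = 3
x_moments = [1.0 / (k + 1) for k in range(2 * N + 1)]
yx_moments = [1.0 / (k + 3) for k in range(N + 1)]
a = projection_coefficients(x_moments, yx_moments, N)
print(np.round(a, 6))
```

In this example the regression function is itself a polynomial of degree 2,
so the computed coefficients should be close to those of x squared. The
moment matrix becomes badly conditioned as N grows, so a sketch of this kind
is only practical for small N.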

Now let X be an arbitrary random variable. Let g be an invertible
Borel measurable function whose range is bounded. Define the random
variable $\tilde{X}$ as $\tilde{X} = g(X)$, and the measure $\tilde{\mu}$ on
the Borel sets of $\mathbb{R}$ by $\tilde{\mu}(A) = P(\tilde{X} \in A)$.
From the above discussion, we see that

$$\tilde{m}(x) = E\{Y \mid \tilde{X} = x\}$$

is determined a.e.$[\tilde{\mu}]$ by the quantities

$$E\{\tilde{X}^k\}, \quad k = 1, 2, \ldots \tag{4}$$

and

$$E\{Y \tilde{X}^k\}, \quad k = 0, 1, 2, \ldots \tag{5}$$

Let $\tilde{Q}_N(x)$ denote the polynomial of max degree N constructed in
the preceding fashion. Then

$$\tilde{Q}_N(x) \to \tilde{m}(x) \text{ in } L_2(\tilde{\mu}).$$

Notice that $m(x) = \tilde{m}[g(x)]$. From a change of variables result
[6, p. 182], we have that

$$\int_{g(\mathbb{R})} [\tilde{Q}_N(x) - \tilde{m}(x)]^2\, \tilde{\mu}(dx)
= \int_{\mathbb{R}} [\tilde{Q}_N[g(x)] - m(x)]^2\, \mu(dx).$$

Therefore, $\tilde{Q}_N[g(x)] \to m(x)$ in $L_2(\mu)$.
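The role of the bounded transformation g can be illustrated numerically. The
sketch below assumes NumPy, takes g to be the hyperbolic tangent, and
replaces the exact moments in (4) and (5) by sample averages. The choice of
g, the simulated model, and all variable names are assumptions made only for
this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

# X has unbounded support; Y is chosen so that E{Y | X = x} = tanh(x).
X = rng.standard_normal(n_samples)
Y = np.tanh(X) + 0.1 * rng.standard_normal(n_samples)

g = np.tanh          # invertible, Borel measurable, bounded range
U = g(X)             # the transformed variable g(X)

N = 2
# Sample versions of the moments in (4) and (5).
u_moments = [np.mean(U ** k) for k in range(2 * N + 1)]
yu_moments = [np.mean(Y * U ** k) for k in range(N + 1)]

# Eq. (3) written for the transformed variable.
H = np.array([[u_moments[j + k] for j in range(N + 1)]
              for k in range(N + 1)])
coeffs = np.linalg.solve(H, np.array(yu_moments))


def m_estimate(x):
    """Evaluate the fitted polynomial at g(x), as an estimate of m(x)."""
    u = g(np.asarray(x, dtype=float))
    return sum(c * u ** j for j, c in enumerate(coeffs))


print(coeffs, m_estimate(0.5), np.tanh(0.5))
```

Here the regression of Y on g(X) is exactly linear, so the fitted
coefficients should be close to (0, 1, 0) up to sampling error, and the
composed estimate should be close to tanh(x).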

Now we will remove the restriction that Y be second order. Assume
that Y is an integrable random variable and let

$$G_k(y) = \begin{cases} y & \text{if } |y| \le k \\
0 & \text{if } |y| > k. \end{cases}$$

Then $G_k(Y)$ is a second order random variable and [1, p. 23]

$$E\{G_k(Y) \mid X = x\} \to E\{Y \mid X = x\} \quad \text{a.e.}[\mu].$$

Since $|G_k(Y) - Y| \le |Y|$ and $|Y|$ is integrable, we have that
$E\{G_k(Y) \mid X = x\} \to m(x)$ in $L_1(\mu)$ by the dominated convergence
theorem [6, pp. 124-125]. Thus from a knowledge of the quantities in Eqs.
(4) and (5) we can derive a sequence of estimates for
$E\{G_k(Y) \mid X = x\}$ which converges in $L_2(\mu)$, and consequently in
$L_1(\mu)$ (see, for example, [7]). Also, $E\{G_k(Y) \mid X = x\}$ converges
to $E\{Y \mid X = x\}$ in $L_1(\mu)$. Thus, by a straightforward
diagonalization procedure, we can derive a sequence of estimates which
converges in $L_1(\mu)$ to m(x). These results are summarized in the
following theorem.

Theorem 1: Let Y be an integrable random variable, let X be an arbitrary
random variable, and let g be an invertible Borel measurable function
mapping the reals into a bounded set. Then the regression function m is
determined a.e.$[\mu]$ by the quantities

$$E\{[g(X)]^k\}, \quad k = 1, 2, \ldots$$

and

$$E\{Y [g(X)]^k\}, \quad k = 0, 1, 2, \ldots .$$
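The $L_1$ convergence used in the truncation step of the development of
Theorem 1 can be spelled out in one line. This is a sketch that assumes the
truncation $G_k(y)$ equals y when $|y| \le k$ and 0 otherwise, and it uses
the conditional form of Jensen's inequality.

```latex
E\bigl|\,E\{G_k(Y) \mid X\} - E\{Y \mid X\}\bigr|
\;\le\; E\bigl\{|G_k(Y) - Y|\bigr\}
\;=\; E\bigl\{|Y|\,\mathbf{1}_{\{|Y| > k\}}\bigr\}
\;\longrightarrow\; 0 \quad (k \to \infty).
```

The final limit holds by dominated convergence, since the integrand is
bounded by the integrable random variable $|Y|$ and tends to 0 pointwise.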


Consider for the moment the case where X and Y are independent. In
this case a solution to Eq. (3) is given by

$$a_0(N) = E\{Y\}$$
$$a_j(N) = 0, \quad j > 0,$$

and we get that $m(x) = E\{Y\}$.

Now consider the following two different bivariate density functions:

$$f_1(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma}
\exp\Big\{-\frac{(y - \rho x)^2}{2\sigma^2}\Big\}\, I_{[0,1]}(x)$$

$$f_2(x,y) = x\, I_{[\rho x - 1,\ \rho x + 1]}(y)\, I_{[0,1]}(x)$$

where $\sigma > 0$, $\rho \in (-1,1)$, and I denotes the indicator function.
Assuming that the density of (X,Y) is given by $f_1$, we find that

$$E\{X^k\} = \frac{1}{k+1}, \qquad E\{Y X^k\} = \frac{\rho}{k+2}.$$

In this case, for $N \ge 1$, a solution to Eq. (3) is given by

$$a_1(N) = \rho \tag{6}$$

$$a_j(N) = 0, \quad j \ne 1, \tag{7}$$

and we conclude that

$$m(x) = \rho x. \tag{8}$$

If we assume that the density of (X,Y) is given by $f_2$, we find that

$$E\{X^k\} = \frac{2}{k+2}, \qquad E\{Y X^k\} = \frac{2\rho}{k+3}.$$

In this case, for $N \ge 1$, Eqs. (6) and (7) still satisfy Eq. (3), and the
regression function is once again given by Eq. (8). Thus, in this example,
the two pairs of marginal densities are not the same, the conditional
densities of Y given X=x are not the same, and the moment sequences are
not the same; however, the moment sequences are sufficient to characterize
the conditional expectations, which are identical. Numerous other similar
examples may easily be constructed.
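The claim that both moment sequences lead to the same coefficients can be
checked with exact rational arithmetic. The sketch below uses only the
Python standard library. The helper names, the choice of one half for the
parameter rho, and the degree N = 3 are assumptions made for illustration;
the moment formulas are those implied by the two densities in the example.

```python
from fractions import Fraction


def solve_exact(A, b):
    """Gauss-Jordan elimination over the rationals (A assumed nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def coefficients(x_moment, yx_moment, N):
    """Solve Eq. (3) given functions k -> E{X^k} and k -> E{Y X^k}."""
    H = [[x_moment(j + k) for j in range(N + 1)] for k in range(N + 1)]
    rhs = [yx_moment(k) for k in range(N + 1)]
    return solve_exact(H, rhs)


rho = Fraction(1, 2)   # illustrative value in (-1, 1)
N = 3

# Moments under the first density: E{X^k} = 1/(k+1), E{Y X^k} = rho/(k+2).
a_f1 = coefficients(lambda k: Fraction(1, k + 1),
                    lambda k: rho / (k + 2), N)
# Moments under the second density: E{X^k} = 2/(k+2), E{Y X^k} = 2 rho/(k+3).
a_f2 = coefficients(lambda k: Fraction(2, k + 2),
                    lambda k: 2 * rho / (k + 3), N)

print(a_f1, a_f2)
```

Both calls should return the coefficient of x equal to rho and all other
coefficients equal to zero, in agreement with Eqs. (6) and (7).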

Now we will consider the regression of Y upon a set of random
variables. Let X be an arbitrary random vector taking values in
$\mathbb{R}^n$, and let $\mu$ be defined on the Borel sets of $\mathbb{R}^n$
by

$$\mu(B) = P(X \in B).$$

Lemma 1: If $\mu$ has compact support, then the class of all polynomials is
dense in $L_2(\mu)$.


Proof: Let q be an arbitrary element in $L_2(\mu)$. For any
$\varepsilon > 0$, there exists [5] a function
$h: \mathbb{R}^n \to \mathbb{R}$ which is continuous and has compact
support such that

$$\|q - h\| < \varepsilon/2.$$

By the Stone-Weierstrass Theorem [8] there exists a polynomial p in n
variables such that

$$\|h - p\| < \varepsilon/2,$$

and thus by the triangle inequality

$$\|p - q\| < \varepsilon.$$

QED


We recall that the degree of a monomial in n variables is the sum of
the powers of the variables, and the degree of a polynomial is the degree
of the monomial having the largest degree over all the monomials in the
polynomial with nonzero coefficients. There are

$$C(n,d) = \binom{n+d-1}{d}$$

monomials of degree d in n variables [9].
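The count C(n,d) can be checked by direct enumeration. The following sketch
uses only the Python standard library and lists the exponent tuples of all
monomials of degree exactly d in n variables. The function names and the
ascending lexicographic order are choices made for this illustration.

```python
from itertools import product
from math import comb


def monomial_exponents(n, d):
    """Exponent tuples of all monomials of degree exactly d in n variables,
    sorted lexicographically by the powers of the variables."""
    return sorted(e for e in product(range(d + 1), repeat=n) if sum(e) == d)


def C(n, d):
    """Number of monomials of degree d in n variables."""
    return comb(n + d - 1, d)


exps = monomial_exponents(3, 2)
print(exps, len(exps), C(3, 2))
```

For example, with n = 3 and d = 2 the enumeration produces the six monomials
whose exponent tuples sum to 2, matching C(3,2) = 6.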

Assume that Y is a second order random variable, and define the regression
function $m(x) = E\{Y \mid X = x\}$ as in the Introduction, where x is now
an element of $\mathbb{R}^n$. Assume that $\mu$ has compact support. Let
$Q_N(x)$ be the polynomial of max degree N which is closer, in the
$L_2(\mu)$ norm, to m(x) than any other polynomial of max degree N.

Consider a monomial in n variables of degree d. There will be
C(n,d) of them. Order them lexicographically by the powers of the
components of x, and let $m_{jd}(x)$ denote the j-th monomial of degree d.

Then $Q_N(x)$ can be expressed as

$$Q_N(x) = \sum_{d=0}^{N} \sum_{j=1}^{C(n,d)} a_{jd}(N)\, m_{jd}(x).$$

It follows from the Projection Theorem that the coefficients $a_{jd}(N)$ are
given by the solution to the following set of equations:

$$E\{Y m_{ik}(X)\} = \sum_{d=0}^{N} \sum_{j=1}^{C(n,d)} a_{jd}(N)\,
E\{m_{jd}(X)\, m_{ik}(X)\}, \tag{9}$$

$k = 0, 1, \ldots, N$ and $i = 1, \ldots, C(n,k)$. If the coefficients
$a_{jd}(N)$ satisfy Eq. (9), then it follows from Lemma 1 that

$$Q_N(x) \to m(x) \text{ in } L_2(\mu).$$
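A sample-based version of Eq. (9) can be sketched as follows, assuming
NumPy. The joint moments are replaced by sample averages, and the simulated
model (X uniform on the unit square, regression function equal to the
product of the two components) is an assumption made purely for
illustration, as are the function names.

```python
import numpy as np
from itertools import product


def monomial_exponents(n, max_degree):
    """Exponent tuples of all monomials in n variables of degree at most
    max_degree, ordered by degree and then lexicographically."""
    exps = [e for e in product(range(max_degree + 1), repeat=n)
            if sum(e) <= max_degree]
    return sorted(exps, key=lambda e: (sum(e), e))


def fit_projection(X, Y, max_degree):
    """Solve a sample version of Eq. (9); returns {exponents: coefficient}."""
    exps = monomial_exponents(X.shape[1], max_degree)
    # Each column holds one monomial evaluated at every sample point.
    Phi = np.column_stack([np.prod(X ** np.array(e), axis=1) for e in exps])
    gram = Phi.T @ Phi / len(Y)     # estimates E{m_jd(X) m_ik(X)}
    cross = Phi.T @ Y / len(Y)      # estimates E{Y m_ik(X)}
    return dict(zip(exps, np.linalg.solve(gram, cross)))


rng = np.random.default_rng(1)
n_samples = 100_000
X = rng.uniform(0.0, 1.0, size=(n_samples, 2))
Y = X[:, 0] * X[:, 1] + 0.05 * rng.standard_normal(n_samples)

coeffs = fit_projection(X, Y, max_degree=2)
print(coeffs)
```

In this example the regression function is a single monomial of degree 2, so
the coefficient attached to the exponent pair (1, 1) should be close to one
and the remaining coefficients close to zero, up to sampling error.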

Now we remove the assumption that X has compact support and let X be
an arbitrary random vector taking values in $\mathbb{R}^n$. Let g be an
invertible Borel measurable function mapping $\mathbb{R}^n$ into a bounded
subset of $\mathbb{R}^n$, and let $\tilde{X} = g(X)$. We see that

$$\tilde{m}(x) = E\{Y \mid \tilde{X} = x\}$$

is determined a.e.$[\tilde{\mu}]$, where
$\tilde{\mu}(A) = \mu[g^{-1}(A)]$, by the quantities

$$E\{m_{jd}(\tilde{X})\}$$

and

$$E\{Y m_{jd}(\tilde{X})\}$$

for $d = 0, 1, 2, \ldots$ and $j = 1, \ldots, C(n,d)$. Let $\tilde{Q}_N(x)$
be the polynomial of max degree N determined in the preceding fashion. Then,
similar to the development of Theorem 1, we can employ a change of variables
result [6, p. 182] to conclude that

$$\tilde{Q}_N[g(x)] \to m(x) \text{ in } L_2(\mu).$$

A chopping argument as in the development of Theorem 1 allows us to remove
the second order restriction on Y. Then a straightforward diagonalization
procedure results in a sequence of estimates which converges to m(x) in
$L_1(\mu)$. This result is summarized in the following theorem.


Theorem 2: Let Y be an integrable random variable, let X be an arbitrary
random vector taking values in $\mathbb{R}^n$, and let g be an invertible
Borel measurable function mapping $\mathbb{R}^n$ into a bounded subset of
$\mathbb{R}^n$. Then the regression function m is determined a.e.$[\mu]$ by
the quantities

$$E\{m_{jd}[g(X)]\} \quad \text{and} \quad E\{Y m_{jd}[g(X)]\}$$

for $d = 0, 1, 2, \ldots$ and $j = 1, \ldots, C(n,d)$.
III. REGRESSION FUNCTIONALS


As before, assume that Y is an integrable random variable, but now
let T be an infinite subset of $\mathbb{R}$ and let $\{X(t),\ t \in T\}$ be
a random process. Let S denote the space of all extended real valued
functions defined on T, and let $\mathcal{B}(S)$ denote the
$\sigma$-algebra on S generated by the class of all cylinders in S. Let
$\mathcal{B}$ denote the Borel sets of $\mathbb{R}$. Then the regression
functional

$$m[x(t),\ t \in T] = E\{Y \mid X(t) = x(t),\ t \in T\}$$

is a measurable function from $(S, \mathcal{B}(S))$ to
$(\mathbb{R}, \mathcal{B})$ (see, for example, [10]).

Let $\mu$ be the measure induced on $\mathcal{B}(S)$ by
$\{X(t),\ t \in T\}$. That is, for any cylinder C in S,
$\mu(C) = P(\{X(t),\ t \in T\} \in C)$, and $\mu$ is extended to
$\mathcal{B}(S)$ via Kolmogorov's Theorem (see, for example, [11]).

It follows from [1, pp. 21, 604] that there exists a countable subset
of T, say $\tilde{T} = \{t_1, t_2, \ldots\}$, depending on the random
variable Y, such that

$$E\{Y \mid X(t) = x(t),\ t \in T\}
= E\{Y \mid X(t) = x(t),\ t \in \tilde{T}\} \quad \text{a.e.}[\mu].$$

Let

$$M = E\{Y \mid X(t),\ t \in \tilde{T}\},$$

$$M_n = E\{Y \mid X(t_1), \ldots, X(t_n)\},$$

$$\mathcal{F} = \sigma\{X(t),\ t \in \tilde{T}\},$$

and

$$\mathcal{F}_n = \sigma\{X(t_1), \ldots, X(t_n)\}.$$

Then from the properties of iterated conditional expectations [1, p. 37],
it follows that

$$E\{M_{n+1} \mid \mathcal{F}_n\} = M_n \quad \text{wp1}$$

and hence $\{M_n, \mathcal{F}_n, n \ge 1\}$ is a martingale. It follows from
[1, p. 332] that $M_n \to M$ wp1. Since
$E\{|M_n|\} \le E\{|Y|\} < \infty$, it follows from a martingale convergence
theorem [1, p. 319] due to Doob that $E\{|M_n - M|\} \to 0$. This is
equivalent to

$$E\{Y \mid X(t_i) = x(t_i),\ i = 1, \ldots, n\}
\to E\{Y \mid X(t) = x(t),\ t \in T\}$$

in $L_1(\mu)$. Notice that Theorem 2 is applicable to
$E\{Y \mid X(t_i) = x(t_i),\ i = 1, \ldots, n\}$. Thus a straightforward
diagonalization procedure results in a sequence of estimates which converges
to $m[x(t),\ t \in T]$ in $L_1(\mu)$. This result is summarized in the
following theorem.


Theorem 3: Let Y be an integrable random variable and let
$\{X(t),\ t \in T\}$ be a random process. Let $\{g_n,\ n = 1, 2, \ldots\}$
be a sequence of functions where $g_n$ is an invertible Borel measurable
function from $\mathbb{R}^n$ to a bounded subset of $\mathbb{R}^n$. Assume
that for all positive integers n and for all sets of n points in T, say
$t_1, \ldots, t_n$, the quantities

$$E\{m_{jd}(g_n[X(t_1), \ldots, X(t_n)])\}$$

and

$$E\{Y m_{jd}(g_n[X(t_1), \ldots, X(t_n)])\}$$

for $d = 0, 1, 2, \ldots$ and $j = 1, \ldots, C(n,d)$ are known. Then up to
$\mu$ equivalence, there is only one possible regression functional
$m[x(t),\ t \in T] = E\{Y \mid X(t) = x(t),\ t \in T\}$.
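The martingale property invoked in the development of Theorem 3 is an
instance of the tower property of conditional expectation. As a brief
sketch, writing $\mathcal{F}_n$ for the $\sigma$-algebra generated by
$X(t_1), \ldots, X(t_n)$ and $M_n = E\{Y \mid \mathcal{F}_n\}$, and using
$\mathcal{F}_n \subset \mathcal{F}_{n+1}$:

```latex
E\{M_{n+1} \mid \mathcal{F}_n\}
= E\bigl\{\,E\{Y \mid \mathcal{F}_{n+1}\} \bigm| \mathcal{F}_n\bigr\}
= E\{Y \mid \mathcal{F}_n\}
= M_n \quad \text{wp1}.
```

The bound $E\{|M_n|\} \le E\{|Y|\}$ follows in the same way from the
conditional form of Jensen's inequality.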